Photorealistic style transfer aims to transfer the artistic style of a reference image onto an input image or video while preserving photorealism. In this paper, we argue that it is the summary-statistics matching scheme in existing algorithms that leads to unrealistic stylization. To avoid the popular Gram loss, we propose a self-supervised style transfer framework, ColoristaNet, which contains a style removal part and a style restoration part. The style removal network strips the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization, which decomposes feature transformation into style whitening and restylization. It works well within ColoristaNet and can transfer image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet achieves better stylization effects than state-of-the-art algorithms.
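The abstract does not spell out decoupled instance normalization; as a rough sketch of the whiten-then-restylize idea it names, one could remove the content features' per-channel statistics and re-apply the style features' statistics (function names and shapes below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def whiten(feat, eps=1e-5):
    """Style whitening: remove per-channel statistics from (C, H, W) features."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mu) / (sigma + eps)

def restylize(whitened, style_feat):
    """Restylization: re-apply the style features' channel statistics."""
    mu_s = style_feat.mean(axis=(1, 2), keepdims=True)
    sigma_s = style_feat.std(axis=(1, 2), keepdims=True)
    return whitened * sigma_s + mu_s

rng = np.random.default_rng(0)
content = rng.normal(size=(64, 32, 32))
style = rng.normal(size=(64, 32, 32)) * 2.0 + 1.0
out = restylize(whiten(content), style)   # content layout, style statistics
```

Splitting the transformation this way makes the two stages independently replaceable, which is the point of decoupling.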
Translated by Google Translate
Accurate localization ability is fundamental in autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (Bird's-Eye-View) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as a sequence of map queries. A cross-modal transformer then associates the BEV features with the semantic map queries, and the localization information of the ego-car is recursively queried out by cross-attention modules. Finally, the ego pose can be inferred by decoding the transformer outputs. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator is capable of estimating the vehicle pose under versatile scenarios, effectively associating cross-modal information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251$^\circ$ in lateral translation, longitudinal translation, and heading angle, respectively.
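The core step, map queries attending over BEV features, can be sketched as single-head cross-attention; this is a generic attention sketch under assumed shapes, not BEV-Locator's actual transformer:

```python
import numpy as np

def cross_attention(queries, bev_feats):
    """Single-head cross-attention sketch: semantic-map queries attend over
    flattened BEV cells; keys and values share the BEV features here.
    queries: (Nq, d), bev_feats: (Nk, d) -> (Nq, d)."""
    d = queries.shape[-1]
    scores = queries @ bev_feats.T / np.sqrt(d)        # (Nq, Nk) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over BEV cells
    return weights @ bev_feats                         # pose-relevant context per query

rng = np.random.default_rng(0)
out = cross_attention(rng.normal(size=(16, 32)), rng.normal(size=(400, 32)))
```

In the paper's pipeline, a decoder head on top of such query outputs would regress the ego pose.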
Secure multi-party computation-based machine learning, referred to as MPL, has become an important technology for utilizing data from multiple parties while preserving privacy. Although MPL provides rigorous security guarantees for the computation process, models trained by MPL are still vulnerable to attacks that rely solely on access to the models. Differential privacy can help defend against such attacks. However, the accuracy loss introduced by differential privacy and the huge communication overhead of secure multi-party computation protocols make balancing the three-way trade-off between privacy, efficiency, and accuracy highly challenging. In this paper, we are motivated to address the above problem by proposing a solution, referred to as PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods. First, we propose a secure DPSGD protocol to enforce DPSGD in secret-sharing-based MPL frameworks. Second, to reduce the accuracy loss caused by differential-privacy noise and the huge communication overhead of MPL, we propose two optimization methods for the MPL training process: (1) a data-independent feature extraction method, which aims to simplify the trained model structure; (2) a local-data-based global model initialization method, which aims to speed up the convergence of model training. We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao. Experimental results on various datasets demonstrate the efficiency and effectiveness of PEA. For example, when ${\epsilon}$ = 2, we can train a differentially private classification model on CIFAR-10 with 88% accuracy within 7 minutes in the LAN setting. This result significantly outperforms that of CryptGPU, a SOTA MPL framework: training a non-private deep neural network model on CIFAR-10 to the same accuracy costs more than 16 hours.
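The DPSGD primitive the protocol secures is standard: clip each example's gradient and add calibrated Gaussian noise. A plaintext sketch of that step (the secret-sharing version would compute the same arithmetic under MPC; parameter values here are illustrative):

```python
import numpy as np

def dp_sanitize(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Core DPSGD step: clip each example's gradient to clip_norm in L2,
    average, then add Gaussian noise calibrated to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    scale = noise_multiplier * clip_norm / len(per_example_grads)
    return avg + rng.normal(0.0, scale, size=avg.shape)

# With noise disabled, the result is just the clipped-and-averaged gradient.
noisy_grad = dp_sanitize([np.full(4, 10.0), np.zeros(4)], noise_multiplier=0.0)
```

Clipping bounds each example's influence, which is what makes the added noise yield a differential-privacy guarantee.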
In recent days, streaming technology has greatly promoted the development of the livestream industry. Due to the excessive length of livestream records, it is essential to extract highlight segments for effective reproduction and redistribution. Although many methods have proven effective at highlight detection in other modalities, the challenges inherent in livestream processing, such as extreme duration, large topic shifts, and much irrelevant information, severely hinder the adaptation and compatibility of these methods. In this paper, we formulate a new task, livestream highlight detection, discuss and analyze the difficulties listed above, and propose a new architecture, AntPivot, to solve this problem. Specifically, we first encode the original data into multiple views and model their temporal relations to capture clues with a hierarchical attention mechanism. Afterwards, we convert the detection of highlight clips into a search for the optimal decision sequence and use the fully integrated representations to predict the final results with a dynamic-programming mechanism. Furthermore, we construct a fully annotated dataset, AntHighlight, to instantiate this task and to evaluate the performance of our model. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method.
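The abstract's dynamic-programming mechanism is not spelled out; as a toy illustration of casting highlight detection as a search for an optimal decision sequence, one could run a two-state DP over per-clip scores (the scoring and switching penalty below are hypothetical, not AntPivot's formulation):

```python
def best_highlight_sequence(scores, switch_cost=0.5):
    """Toy dynamic program: label each clip background (0) or highlight (1),
    maximizing total highlight score minus a penalty per state switch."""
    dp = [0.0, scores[0]]                 # best value ending in state 0 / 1
    back = []                             # predecessor state per step
    for t in range(1, len(scores)):
        prev, back_t = dp[:], []
        for s in (0, 1):
            stay, switch = prev[s], prev[1 - s] - switch_cost
            back_t.append(s if stay >= switch else 1 - s)
            dp[s] = max(stay, switch) + (scores[t] if s == 1 else 0.0)
        back.append(back_t)
    s = 0 if dp[0] >= dp[1] else 1        # backtrack the optimal labeling
    labels = [s]
    for back_t in reversed(back):
        s = back_t[s]
        labels.append(s)
    return labels[::-1]
```

The switch penalty discourages fragmenting a recording into many tiny segments, which matters at livestream durations.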
Large-scale datasets are important for learning good features for 3D shape understanding, yet only a few datasets are suitable for deep learning training. One of the major reasons is that current tools for annotating per-point semantic labels with polygons or scribbles are tedious and inefficient. To facilitate segmentation annotation on 3D shapes, we propose an effective annotation tool named iSeg for 3D shapes (iSeg3D). It can obtain satisfactory segmentation results with minimal human clicks (< 10). Based on our observation that most objects can be regarded as compositions of finite primitive shapes, we train the iSeg3D model on our constructed primitive-composed shape data to learn geometric prior knowledge in a self-supervised manner. Given human interactions, the learned knowledge can be used to segment parts on arbitrary shapes, where positive clicks help associate primitives into semantic parts and negative clicks can avoid over-segmentation. In addition, we provide an online human-in-the-loop fine-tuning module that enables the model to achieve better segmentation with fewer clicks. Experiments demonstrate the effectiveness of iSeg3D on PartNet shape segmentation. Data and code will be made publicly available.
Articulated objects are pervasive in daily life. However, due to their intrinsic high-DoF structures, the joint states of articulated objects are difficult to estimate. To model articulated objects, two kinds of shape deformation, namely geometric and pose deformation, should be considered. In this work, we present a novel category-specific parametric representation called Object Model with Articulated Deformations (OMAD) to explicitly model articulated objects. In OMAD, a category is associated with a linear shape function with a shared shape basis and a nonlinear joint function. Both functions can be learned from a large-scale object model dataset and fixed as category-specific priors. We then propose an OMADNet to predict the shape parameters and joint states from a single observation of an object. With this full representation of object shape and joint states, we can address several tasks, including category-level object pose estimation and articulated object retrieval. To evaluate these tasks, we create a synthetic dataset based on PartNet-Mobility. Extensive experiments show that our simple OMADNet can serve as a strong baseline for both tasks.
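A linear shape function of the kind OMAD describes can be sketched as a mean point cloud deformed by a shared basis weighted by instance coefficients; shapes and names below are illustrative assumptions:

```python
import numpy as np

def linear_shape(mean_shape, shape_basis, beta):
    """Linear shape function: category-level mean point cloud plus a shared
    deformation basis weighted by instance coefficients beta.
    mean_shape: (N, 3), shape_basis: (K, N, 3), beta: (K,) -> (N, 3)."""
    return mean_shape + np.tensordot(beta, shape_basis, axes=1)

rng = np.random.default_rng(0)
mean_shape = rng.normal(size=(100, 3))
basis = rng.normal(size=(5, 100, 3))
instance = linear_shape(mean_shape, basis, np.zeros(5))  # beta = 0 recovers the mean shape
```

The nonlinear joint function would then pose the parts of this geometry; that part is not sketched here.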
Building a general-purpose robot that performs a wide variety of tasks in human-level environments is notoriously complicated. It requires robot learning to be sample-efficient, generalizable, compositional, and incremental. In this work, we introduce a systematic learning framework called the SAGCI-system aimed at achieving these four requirements. Our system first takes the raw point clouds collected by a camera mounted on the robot's wrist as input and produces an initial modeling of the surrounding environment, represented as a URDF. Our system adopts a learning-augmented differentiable simulation that loads the URDF. The robot then utilizes interactive perception to interact with the environment and to modify the URDF. Leveraging the simulation, we propose a new model-based RL algorithm that combines object-centric and robot-centric approaches to efficiently produce policies that accomplish manipulation tasks. We apply our system to articulated object manipulation in both simulation and the real world. Extensive experiments demonstrate the effectiveness of our proposed learning framework. Supplemental materials and videos are available at https://sites.google.com/view/egci.
Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static; rather, they evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge through growth. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention when learning on growing snapshots of a KG, without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, an embedding transfer strategy to inject the learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines.
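One common form of embedding regularization against catastrophic forgetting is an L2 anchor toward the previous snapshot's embeddings; a minimal sketch of one such update step (the paper's exact regularizer may differ, and all parameter values here are assumptions):

```python
import numpy as np

def regularized_update(emb, old_emb, grad, lr=0.1, reg=0.5):
    """One SGD step on the task loss plus an L2 anchor that pulls embeddings
    of previously seen entities toward their earlier-snapshot values."""
    return emb - lr * (grad + reg * (emb - old_emb))

# With a zero task gradient, the step simply shrinks the drift from the old snapshot.
updated = regularized_update(np.ones(3), np.zeros(3), np.zeros(3))
```

The `reg` coefficient trades plasticity on new snapshots against retention of earlier knowledge.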
This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample-efficient and domain-agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep neural network is a simple MLP, which takes as input an image triplet encoded as the difference between two vector-Kronecker products and outputs a binary relevance ranking order. The proposed RankMLP can be built on top of any state-of-the-art feature extractor, and our entire deep neural network is called the ranking deep neural network, or RankDNN. Meanwhile, RankDNN can be flexibly fused with other post-processing methods. During the meta test, RankDNN ranks support images according to their similarity with the query samples, and each query sample is assigned the class label of its nearest neighbor. Experiments demonstrate that RankDNN can effectively improve the performance of its baselines based on a variety of backbones, and it outperforms previous state-of-the-art algorithms on multiple few-shot learning benchmarks, including miniImageNet, tieredImageNet, Caltech-UCSD Birds, and CIFAR-FS. Furthermore, experiments on the cross-domain challenge demonstrate the superior transferability of RankDNN. The code is available at: https://github.com/guoqianyu-alberta/RankDNN.
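The triplet encoding can be sketched as the difference of two vector Kronecker products; how the embeddings are produced and how the MLP is trained are left out here, so this only illustrates the input construction:

```python
import numpy as np

def ranking_feature(query, support_a, support_b):
    """Triplet encoding: the difference of two vector Kronecker products,
    which a small MLP would map to a binary ranking order."""
    return np.kron(query, support_a) - np.kron(query, support_b)

q = np.ones(4)
feat = ranking_feature(q, np.arange(4.0), np.arange(4.0))  # identical supports -> zero feature
```

By construction the feature is antisymmetric in the two support images, which matches the binary "a ranks above b" target.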
Incorporating large-scale pre-trained models with prototypical neural networks is a de-facto paradigm in few-shot named entity recognition. Existing methods, unfortunately, overlook the fact that embeddings from pre-trained models contain a prominently large amount of information about word frequencies, which biases prototypical neural networks against learning word entities. This discrepancy constrains the two models' synergy. Thus, we propose a one-line-code normalization method to reconcile the mismatch, with empirical and theoretical grounds. Our experiments on nine benchmark datasets show the superiority of our method over counterpart models, with performance comparable to state-of-the-art methods. In addition to the model enhancement, our work also provides an analytical viewpoint for addressing general problems in few-shot named entity recognition and other tasks that rely on pre-trained models or prototypical neural networks.
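The abstract does not state which one-line normalization is used; one plausible sketch of the idea, removing magnitude information (which correlates with word frequency) before prototypes are computed, is unit-norm projection. This is an assumption for illustration, not necessarily the paper's method:

```python
import numpy as np

def normalize(emb, eps=1e-12):
    """One-line-style fix (sketch): project embeddings onto the unit sphere
    so frequency-correlated magnitude cannot bias the class prototypes."""
    return emb / (np.linalg.norm(emb, axis=-1, keepdims=True) + eps)

unit = normalize(np.array([[3.0, 4.0], [0.5, 0.0]]))  # rows of very different norms
```

After such a step, prototype distances depend only on embedding direction, not on how frequent a token is.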